EPA Superfund Sites

Week 29 Viz

Posted by David Velleca on September 21, 2018

Data Storytelling and Dataviz Approach

I took a bit more time with this viz so I could get it just right. The data is well suited both to a map that plots the Superfund sites and to a timeline that shows the number of sites by year. I decided to focus on the map. I could have made a very simple one, but when I started looking at this dataset, I was curious about how many sites were in my area, so I decided to tackle that question.

Initially, I was a bit stuck trying to figure out my approach. The distance calculation using great circle routes is prominent on the web, with Tableau even having a knowledge base article that references it. However, a lot of the examples that I saw relied on data blending. The problem that I was finding with that approach is that the blend acts like an inner join. Alternate approaches join the dataset to itself, but in both instances, this means that the list of possible locales to calculate a distance from is limited to the sites that exist in the main data source (the EPA data in this case).

I really wanted to be able to figure out the number of sites within a set radius of any zip code, so blending just wasn't going to work. Instead, I joined my EPA dataset to a supplemental zip code dataset with a right join on zip code. That brought in my EPA data along with a row for every zip code, even the ones that don't appear in the EPA data. It also meant that I was going to have to deal with some null values, but more on that later.
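For readers who think in code rather than in Tableau's join dialog, here is a rough Python sketch of what that right join produces. The sample rows, the field names (epa_id, zip, lat2, long2) and the helper right_join_on_zip are made up for illustration; they are not the actual columns of either dataset.

```python
# Made-up sample rows; the real EPA and zip code files have many more columns.
epa_sites = [
    {"epa_id": "SITE-A", "zip": "10001"},
    {"epa_id": "SITE-B", "zip": "10001"},
    {"epa_id": "SITE-C", "zip": "60601"},
]
zip_codes = [
    {"zip": "10001", "lat2": 40.75, "long2": -73.99},
    {"zip": "60601", "lat2": 41.89, "long2": -87.62},
    {"zip": "94105", "lat2": 37.79, "long2": -122.39},  # no site in this zip
]


def right_join_on_zip(left, right):
    """Keep every row of `right`; attach matching `left` rows, or None if absent."""
    sites_by_zip = {}
    for site in left:
        sites_by_zip.setdefault(site["zip"], []).append(site)

    joined = []
    for z in right:
        # A zip with no sites still yields one row, with a null (None) epa_id.
        for site in sites_by_zip.get(z["zip"], [None]):
            joined.append({
                "epa_id": site["epa_id"] if site else None,
                "zip": z["zip"],
                "lat2": z["lat2"],
                "long2": z["long2"],
            })
    return joined


rows = right_join_on_zip(epa_sites, zip_codes)
```

The row for zip 94105 comes through with an empty epa_id, which is exactly the kind of null mentioned above.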

Now that I had my data, the next step was to calculate the distance. The calculation requires two latitudes and longitudes - in this case, one for the sites, and one for the chosen location. I created a parameter that included all the zip codes, and used this to drive calcs for [Chosen Lat] and [Chosen Long]. The catch here is that the Chosen Lat and Long have to be available on every row, regardless of which site that row belongs to. If I had just used a simple calc like IF [Zip Code Parameter] = STR([Zip]) THEN [Lat2] END, I would only have a Chosen Latitude if the Site's Zip was the same as the input parameter. This isn't any better than the solution that uses blending.

Here's the magic - using a level of detail calculation, I could force the data to be available regardless of the site's zip. The calculation is actually pretty straightforward - {FIXED : AVG(IF [Zip Code Parameter] = STR([Zip]) THEN [Lat2] END)}. I duplicated the calculated field, adjusted it for Longitude, and I was ready to calculate the distance. I won't go into that logic here, but if you're interested, check out this site.
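If the difference between the row-level calc and the FIXED version is hard to picture, this small Python sketch mimics both. It is only an analogy for how Tableau evaluates the two calculations, and the sample rows and function names are hypothetical.

```python
# Made-up rows standing in for the joined data (one row per site or empty zip).
rows = [
    {"epa_id": "SITE-A", "zip": "10001", "lat2": 40.75},
    {"epa_id": "SITE-C", "zip": "60601", "lat2": 41.89},
    {"epa_id": None, "zip": "94105", "lat2": 37.79},
]


def row_level_chosen_lat(rows, zip_param):
    """Mimics IF [Zip Code Parameter] = STR([Zip]) THEN [Lat2] END.

    Only rows whose zip matches the parameter get a value; the rest are null.
    """
    return [r["lat2"] if r["zip"] == zip_param else None for r in rows]


def fixed_chosen_lat(rows, zip_param):
    """Mimics {FIXED : AVG(IF ... THEN [Lat2] END)}.

    The average ignores nulls, is computed once over the whole table,
    and that single value is then handed to every row.
    """
    matches = [r["lat2"] for r in rows if r["zip"] == zip_param]
    avg = sum(matches) / len(matches) if matches else None
    return [avg for _ in rows]


row_level = row_level_chosen_lat(rows, "94105")
table_wide = fixed_chosen_lat(rows, "94105")
```

With the parameter set to 94105, the row-level version only has a latitude on the last row, while the FIXED-style version carries it on all three, so every site can be measured against the chosen zip.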

I then created a parameter that lets the user set their desired distance radius, and created a Set with a Condition by formula where [Distance] <= [Distance Parameter]. I used this Set as the filter and I was done. I polished it off with a summary of the chosen locale and how many sites were in the distance range.
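To give a feel for the distance and filter steps outside of Tableau, here is a minimal Python sketch. It assumes the haversine form of the great circle formula and distances in miles; the exact formula and units in my workbook may differ, and the function names and sample points are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes distances in miles


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two lat/long points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def sites_within(sites, chosen_lat, chosen_long, radius_miles):
    """Mimics the Set condition [Distance] <= [Distance Parameter]."""
    return [
        s for s in sites
        if great_circle_miles(chosen_lat, chosen_long, s["lat"], s["long"])
        <= radius_miles
    ]


# Made-up points along the equator, roughly 0, 69 and 345 miles from (0, 0).
sample_sites = [
    {"epa_id": "SITE-A", "lat": 0.0, "long": 0.0},
    {"epa_id": "SITE-B", "lat": 1.0, "long": 0.0},
    {"epa_id": "SITE-C", "lat": 5.0, "long": 0.0},
]
nearby = sites_within(sample_sites, 0.0, 0.0, 100)
```

With a 100 mile radius around (0, 0), only SITE-A and SITE-B make the cut.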

I mentioned above that I would come back to the nulls. I actually made the nulls work to my advantage. I was able to use ISNULL([Epa Id]) in several of the calculations I used to create my tooltips and drive the colors and shapes on the map. If you'd like to see what I mean, take a look at the workbook in Tableau.

A few caveats here, for the benefit of my GIS expert friend who suggested I do this the right way with GIS software... The distances are approximate; the accuracy is better when a smaller radius is chosen; the sites' locations are based on their zip codes, which have the latitude and longitude at their approximate centers. For what I'm doing though, it's close enough.

I really enjoyed the challenge this week, and learned some things in the process. If you'd like to see what you can do with this week's dataset, create your viz and post your work to Tableau Public and Twitter with the hashtag #ThrowbackDataThursday, tagging @TThrowbackThurs. We'd really love to see what you can come up with!

Data Source

This week's dataset comes from the EPA. Please be sure to cite the source on your viz.